Petabyte-scale innovations at the European Nucleotide Archive

Authors

  • Guy Cochrane
  • Ruth Akhtar
  • James K. Bonfield
  • Lawrence Bower
  • Fehmi Demiralp
  • Nadeem Faruque
  • Richard Gibson
  • Gemma Hoad
  • Tim J. P. Hubbard
  • Christopher Hunter
  • Mikyung Jang
  • Szilveszter Juhos
  • Rasko Leinonen
  • Steven Leonard
  • Quan Lin
  • Rodrigo Lopez
  • Dariusz Lorenc
  • Hamish McWilliam
  • Gaurab Mukherjee
  • Sheila Plaister
  • Rajesh Radhakrishnan
  • Stephen Robinson
  • Siamak Sobhany
  • Petra ten Hoopen
  • Robert Vaughan
  • Vadim Zalunin
  • Ewan Birney
Abstract

Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. We present here our new repository for next-generation sequence data, give a brief summary of the contents of the ENA, and provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.


Similar resources

T-Archive: A Novel HSM-Based Data Archive System

Rapid increases of user data from terabytes to petabytes have created new challenges in data archiving. Modern data archive systems require higher adaptivity, reliability, and performance than traditional data archive systems can provide. Recently Hierarchical Storage Management (HSM) has been applied to a data archive that stores data in a multi-level storage system according to access frequen...


Target for LOFAR Long Term Archive: Architecture and Implementation

The LOFAR Long-Term Archive (LTA) is a multi-petabyte-scale data store for the processed data of the LOFAR telescope. We describe the adaptation of the WISE concept implemented by the Target consortium for the LOFAR LTA and the changes we introduced to it to accommodate LOFAR data. This paper describes an example of a new information system created on the basis of Astro-WISE for a wider range and scale o...


PROBA-V Mission Exploitation Platform

As an extension of the PROBA-Vegetation (PROBA-V) user segment, the European Space Agency (ESA), de Vlaamse Instelling voor Technologisch Onderzoek (VITO), and partners TRASYS and Spacebel developed an operational Mission Exploitation Platform (MEP) to drastically improve the exploitation of the PROBA-V Earth Observation (EO) data archive, the archive from the historical SPOT-VEGETATION mission...


Pergamum: energy-efficient archival storage with disk instead of tape

Dr. Ethan L. Miller is an associate professor of computer science at the University of California, Santa Cruz, where he is a member of the Storage Systems Research Center (SSRC). His current research projects, which are funded by the NSF, Department of Energy, and industry support for the SSRC, include long-term archival storage systems, scalable metadata and indexing, issues in petabyte-scale ...


Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database

The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth in data volume and diversity. Selected major developments of 2007 are presented briefly, along with data submission and retrieval information. In the face of increasing requirements for nucleotide ...



Journal title:

Volume 37, Issue 

Pages  -

Publication date: 2009